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ABSTRACT 

We compare numerically the effectiveness of three recently proposed algorithms, mul- 
ticanonical simulations, simulations in a 1/ fc-ensemble, and simulated tempering, for the 
protein folding problem. For this we perform simulations with high statistics for one of 
the simplest peptides, Met-enkephalin. While the performances of all three approaches is 
much better than traditional methods, we find that the differences among the three are 
only marginal. 



Key Words: Monte Carlo, generalized ensemble, protein folding, multiple-minima prob- 
lem, global energy minimization 



1 e-mail: hansmann@ims.ac.jp 

2 e-mail: okamotoy@ims.ac.jp 



INTRODUCTION 

While there has been a considerable progress in numerical simulations of biological macro- 
molecules over the last thirty years (for a recent review see, for instance, Ref. [0J), the 
prediction of their low-energy conformations from the first principles remains a formidable 
task. The numerical calculation of physical quantities depend on how comprehensively 
the phase space of the system is explored. However, the energy landscape of the system of 
peptides and proteins is characterized by a multitude of local minima separated by high 
energy barriers. At low temperatures traditional Monte Carlo and molecular dynamics 
simulations tend to get trapped in one of these local minima. Hence, only small parts of the 
phase space are sampled (in a finite number of simulation steps) and physical quantities 
cannot be calculated accurately. A common approach to alleviate this multiple-minima 
problem is to identify the global minimum in the free energy (which should corrospond 
to the native conformation of a protein) with the lowest potential energy conformation, 
ignoring entropic contributions, and to search for this conformation with powerful opti- 
mization techniques such as Monte Carlo with minimization, genetic algorithms, fU 
or simulated annealing. 0] Since simulated annealing was first introduced to the protein 
folding problem, 0-[f§ the effectiveness was tested with many peptides and proteins, and 
it is now one of the most popular optimization methods used in the field (for a recent 
review see Ref. @). 

It is claimed that a variety of recently proposed methods like the multicanonical 
approach, [110, |TTf entropic sampling, El simulated tempering, |jl3|, |14|| and simulations in a 
so-called 1 / fc-ensemble[|l5H allow a much better sampling of the phase space than previous 
methods. Actually, two of these methods, multicanonical approach and entropic sampling, 
are mathematically identical as shown in Ref. [IB|. The multicanonical algorithm was first 
applied to the protein folding problem in Ref. [[17]]. In Ref. |18|] the authors showed that 
this new ansatz yields indeed to an improvement over simulated annealing. The method 



was also applied to the studies of the coil-gobular transitions of a model protein JT9| 
and the helix-coil transitions of homo-oligomers of nonpolar amino acids |2(J. Simulated 
tempering was applied to the study of a model heteropolymer. IpTR 

Applications of simulated tempering and the 1/k — ensemble to the protein folding 
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problems are still missing. Also missing is a quantitative comparison of the effectiveness 
of these three new methods. In the present article we try to fill in this gap. By simulating 
one of the simplest peptides, Met-enkephalin, with high statistics, we study the numerical 
performances of the three methods. In particular, the efficiency of finding the global- 
minimum-energy conformation and calculating thermodynamic quantities is studied in 
detail. 

METHODS 
Algorithms 

Although the algorithms are explained in the original papers, we briefly summarize here 
the ideas and implementations of the three approaches for completeness. 

Simulations in the canonical ensemble weight each configuration with the Boltzmann 
factor wb(T, E) = e~ l3E and yield the usual bell-shaped probability distribution of energy 
at temperature T: 

P B (T,E) oc n(E)w B (T,E) = n(E) e^ E , (1) 

where n(E) is the density of states (spectral density) and (3 = 1/ksT is the inverse 
temperature (we set the Boltzmann constant k B to unity hereafter). 

Using local updates, the probability for the system to cross an energy barrier is pro- 
portional to e _/3A£; in the canonical ensemble. Here, AE is the height of the energy 
barrier. Hence, at low temperatures (large (3) simulations will easily get trapped in one 
of the local minima and very long simulations are necessary to calculate thermodynamic 
quantities accurately. In general, one can think of two strategies to alleviate this problem. 
First, one can look for improved updating scheme. For instance, a kind of global updating 
scheme called the cluster update method |22|, |23| is popular in spin systems. No such 
global updates are known for proteins. Secondly, one can perform the simulation in a 
" generalized ensemble" , where the above difficulty does not arise. The latter approach is 
the one that is investigated in the present article. 

In the multicanonical approach [II], Ft] configurations with energy E are updated with 



a weight: 

W m u{E) OC TrvT = B~ 8 ^ , (2) 

n{E) 



where 

S(E)=\ogn(E) (3) 

is the microcanonical entropy. A simulation with this weight factor will produce a uniform 
distribution of energy, 

P mu {E) oc n{E) w mu (E) = const , (4) 

resulting in a free random walk in the energy space. This allows the simulation to escape 
from any energy barrier, and even regions with small n(E) can be explored in detail. 

Unlike in a canonical simulation the weight w mu (E) is not a priori known (in fact, 
the knowledge of the exact weight is equivalent to obtaining the density of states n(E), 
i.e., solving the system), and one needs its estimator for a numerical simulation. Hence, 
the multicanonical ansatz consists of three steps: In the first step the estimator of the 
multicanonical weight factor w mu (E) is calculated. Then one performs with this weight 
a multicanonical simulation with high statistics. In this way information is collected over 
the whole energy range. We remark that there is no explicit temperature dependence 
in simulations of the multicanonical ensemble. However, from this simulation one can 
not only locate the energy global minimum but also obtain the canonical distribution 
at any inverse temperature ft = ^ for a wide range of temperatures by the re-weighting 
techniques: |]24[ 

P B (T,E) <x P mu (E) w-l(E) e-? E . (5) 

This allows one to calculate the expectation value of any physical quantity O at temper- 
ature T by 

/ dE 0(E)P B (T,E) 

<0> T = J — r . (6) 

J dE P B (T,E) 

The crucial step is the calculation of the estimator for the multicanonical weight 
factor w mu {E). It is obtained by an iterative procedure. The improved estimator of the 
multicanonical weight for the i-th iteration is calculated from the histogram of energy 
distribution P^~ l \E) and the weight w^ 1 ' of the preceeding simulation as follows: 



pL 1 \e) 



^u(E) = (7) 
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The first iteration is a canonical simulation at sufficiently high temperature Tq = -J-, with 



w 



(E) = e /3 ° E . For details, see Refs. and |2(|. The method for calculating the 



multicanonical weight factor is by no means unique. While it is quite general, it has the 
disadvantage that it requires iterations of short simulations, the number of which is not 
known a priori. 

Performing simulations in a so-called 1/k-ensemble is a way of sampling the (micro- 
canonical) entropy S uniformly: 

P l/k (S) = const . (8) 

This equation implies that 

P 1/k (E) = P 1/k (S)^ cx ^ = ?(E) , (9) 
where $(E) stands for the effective inverse temperature^ and is defined by 

(3(E) = J- = dl ° g ^ E) . (10) 
hy ' T(E) dE v ; 

There is again no explicit temperature dependence in simulations of the 1/k ensemble; it 

can be said that simulations in both multicanonical and 1/k ensembles include information 

of all temperatures. If the simulation is restricted to a certain energy interval, Eq. (0) 

defines the corresponding temperature range (assuming n(E) is a monotonous function 

of E) over which reliable results can be obtained. 

In their original paper, Hesselbo and Stinchcombe [JiJ proposed that configurations 

are assigned a weight 

= ip, • (id 

where the function k(E) is defined as the integral of density of states with respect to 
energy E: 

k{E) = [ E dE' n(E') . (12) 



This implies that 



. - . . n(E) d\ogk(E) 

P llk (E) S n(E )m/i (E) = ^ = . (13) 



3 We remark that the term "multicanonical algorithm" was inspired by this effective temperature. 
In the early work, Berg and co-workers used for the multicanonical weight the specific parametrization 
exp(— j3(E) E — a(E)), approximating S(E) — logn(E) by straight lines between adjacent energy bins. 



Since the density of states n(E) is a rapidly increasing function of energy, we have 
logk(E) « logn(-B) for wide range of values of E. Hence, Eqs. @ and ( ]T3|) are equiv- 
alent, and a random walk in the entropy space is realized. Since the entropy S(E) is a 
monotonically increasing function of energy, a random walk in entropy implies a random 
walk in energy space (with more weight towards low-energy region; compare Eqs. (fj) and 
fllTD ). Hence, a simulation in 1 //c-ensemble can escape from any energy barrier. Hesselbo 
and Stinchcombe claim that the l//c-sampling is superior to the multicanonical algorithm 
in the sense that a fewer number of sampling is necessary for the former. fll"5fl 

In numerical work, energy is discretized and integrals are replaced by sums. Again, 
the weight wuk(E) is not a priori known and its estimator has to be calculated. It is 
obvious that the method described above for determination of the multicanonical weight 
factor is also suited for the calculation of the weight in the l//c-ensemble. In addition, it 
follows from the definitions of the weights, Eqs. (J|), fllTD, and (0), that one can calculate 
W\/k{E) from w mu (E), and vice versa. Thermodynamic quantities at any temperature 
can be calculated by Eq. ([]) with the re- weighting techniques of Eq. (f|), in which P mu (E) 
and w mu (E) are replaced by Pi/ k (E) and Wi/k(E), respectively. 

The third approach is simulated tempering, which was first introduced under the name 
method of expanded ensembles by Luyabartsev et al. JTS], but became more popular under 



the former name proposed by Marinari and Parisi in Ref. |T4[. In this ansatz temperature 
itself becomes a dynamical variable. Temperature and configuration are both updated 
with a weight: 

w ST {T,E) = e- E / T -^ , (14) 

where the function g(T) is chosen so that the probability distribution of temperature is 
given by 

P ST (T) = J dE n(E) e~ E/T - g{T) = const . (15) 

Hence, in simulated tempering the temperature is sampled uniformly, while simulations 
in multicanonical and l/k ensembles respectively sample energy and entropy uniformly. 
A random walk in temperature space is realized, allowing the simulation to escape from 
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any energy barrier. The last equation, Eq. ( jig) , implies that 

e 9{T) oc J dE n(E) e~ E/T . (16) 

The function g(T) is therefore proportional to the logarithm of the canonical partition 
function at temperature T. Again, the weight wst{T, E) is not a priori known and 
its estimator has to be calculated. We remark, however, that Eq. (|i~6|) allows easily 
the calculation of the function g(T) and therefore of the weight wst{T, E), once the 
multicanonical weight w mu (E) = n^i^E) is given. Likewise, the multicanonical weight 
w mu (E) can in principle be calculated from g(T) by the inverse Laplace transformation. 

In the numerical work the temperature is discretized and restricted to a certain inter- 
val [T min ,T max }. Integrals are replaced by sums. We also found it convenient to choose 
the temperature points T not equidistant, but so that the increment of adjacent temper- 
ature points decreases exponentially with decreasing temperature. Of course, this implies 
that we do not sample in a uniform way in temperature but by a monotone function of 
temperature. 

Given the parameters g^ = g(Ti) a simulated tempering simulation may use convential 



Metropolis algorithm [25| with two kinds of Monte Carlo updates: 



• Updates of configurations (conformations) at a fixed temperature Tj with probability 
mm{l,exp[-(E new - E i d )/T n ]}. 

• Updates of temperatures with a fixed configuration (conformation) with probability 
min{l,exp[-T(l/T new - l/T old ) - (g(T new ) - g{T M )))}. 

Using these updates, one can perform a simulated tempering simulation. 

Crucial for this method is again the determination of the parameters gi (i = 1, ■ ■ ■ ,n). 
Given the temperature points T (i — 1, • • • , n with T\ = T max > Ti > T n = T min ), gi can 
be calculated by the following iterative procedure: 

1. Start with a short canonical simulation of m\ MC sweeps updating only configura- 
tions at temperature T± = T max and calculate the average energy < E >t x ■ Formally, 
this can be regarded as a simulated tempering simulation at a fixed temperature Tl 
with weight wst(T, E) = e~ E ^ Tl ~ gi and g\ = 0. 



2. Calculate new parameters gj according to: 

{gj + log(mj-) , 1 < j < i 

gj+<E> Ti ^_L_I^ , j = i + l (17) 
oo , j > i + 1 

3. Start a new simulation, now updating both configurations and temperatures, with 
weight wst(T, E) = e~ E ^ Tj ~ 9i and sample the distribution of temperatures Tj in the 
histogram rrij = m(Tj). For T = T i+1 calculate the average energy < E >r i+1 - 

4. If the histogram rrij is approximately flat in the temperature range T\ > Tj > T i+1 , 
set i = % + 1. Otherwise, leave i unchanged. 

5. Iterate the last three steps until the obtained temperature distribution rrij becomes 
flat over the whole temperature range [T n = T min , T\ = T max }. 

Once the weight factor wst{T,E) is obtained, we make a production run with high 
statistics. Physical quantities have to be sampled for each temperature point separately. 
Their expectation values at temperature T are then calculated in the usual way by 



< O > T 



J dx 0{x) e~ E ^l T 
J dx e- E ^' T 



where x labels the conformations, and only those conformations that were obtained at 
temperature T are included in the integral. Expectation values at intermediate tempera- 
tures can be calculated by the reweighting techniques. ||24|| 
Peptide Preparation and Potential Energy Function 

Met-enkephalin has the amino-acid sequence Tyr-Gly-Gly-Phe-Met. For our simulations 
the backbone was terminated by a neutral NH2- group at the N-terminus and a neutral - 
COOH group at the C-terminus as in the previous works of Met-enkephalin. 0, ^, [26|, |27|, 



Hfl The potential energy function E tot that we used is given by the sum of the electrostatic 
term E es , 12-6 Lennard- Jones term E v dw, and hydrogen-bond term Ehb for all pairs of 
atoms in the peptide together with the torsion term E t0 rs for all torsion angles: 

Etot = E es + E vdW + E^ + E tors , (19) 
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es 



332q i q j 



(20) 




(21) 



(22) 



E 



'tors 



Y,Ui{l± cos(njXi)) > 



(23) 



where is the distance between the atoms i and j, and xi is t ne ^-th torsion angle. 
This E to t was used in the actual simulations of the three algorithms. The parameters 
(qi, Aij, Bij, Cij, Dy, Ui and n{) for the energy function were adopted from ECEPP/2. ||28|]- 
p0| The computer code KONF90 was used. The peptide-bond dihedral angles ui were 
fixed at the value 180° for simplicity, which leaves 19 angles fa, ipi, and x% as independent 
variables. We remark that KONF90 uses a different convention for the implementation of 
the ECEPP parameters (for example, fa of ECEPP/2 is equal to fa - 180° of KONF90). 
Therefore our energy values are slightly different from those of the original implementation 
of ECEPP/2. 
Computational Details 

Preliminary runs showed that all methods need roughly the same amount of CPU time 
for a fixed number of MC sweeps (about 15 minutes for 10,000 sweeps on an IBM RS6000 
320H). Hence, we compared the different methods by performing simulations with the 
same number of total MC sweeps. By setting this number to 1,000,000 sweeps we tried 
to ensure high statistics. One MC sweep updates every torsion angle of the peptide once. 

In the case of simulated tempering we chose 30 temperature points which were expo- 
nentially distributed between T min = 50 K and T max = 1000 K, i.e., Tj = T max x 7^ _1 ) 
(i — 1, • • • , 30) with 7 = (Tmin/Tmax) 1 / 29 . This specific choice of temperature points was 
made because we found in preliminary runs that for an equidistant distribution of temper- 
atures we either needed a very large number of intermediate temperature points or could 
not obtain reliable estimates for = g(Tj). Even with the above choice of temperature 
points we needed 150,000 sweeps to calculate the simulated tempering parameters gi. 

In our earlier work we found that we needed 40,000 sweeps to calculate the mul- 
ticanonical weight for Met-enkephalin. To better compare the different methods we tried 
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to improve the weights by further iterations till the total number of sweeps was again 
150,000. Although the weight for l//c-ensemble can also be determined by the iterative 
procedure, here we just calculated it from the obtained multicanonical weight by means 
of Eq. fljjP to save computation time. 

All thermodynamic quantities were then calculated from a production run of 1,000,000 
MC sweeps for each of the three methods which followed 10,000 sweeps for thermalization. 
At the end of every second sweep we stored the energy of the actual conformation for 
future analysis of thermodynamic quantities. In all cases, the simulations started from 
completely random initial conformations ("Hot Start"). 
RESULTS AND DISCUSSION 

We start the Results section by demonstrating some of the basic features of the three 
methods. As explained above, we expect for all three algorithms that simulations result 
in a (weighted) random walk in energy space. In Fig. 1 we show the time series of energy 
from the production runs of 1,000,000 MC sweeps for the three methods. They all exhibit 
a random walk between low energy states and high energy states. In Ref. [27] it was shown 
that with the energy parameters of KONF90, conformations with potential energies less 
than —11 kcal/mol essentially have the same structure (with small deviations in dihedral 
angles), which is the conformation with the global- minimum energy. The random walks 
in Fig. 1 all reached this lowest-energy state many times. The numbers of independent 
such visits will be examined in detail below. 

By comparing Eqs. (|2|) and (|11]), one expects that the random walk in 1/ A;-ensemble 
is supposed to be biased towards the low-energy region, while that in multicanonical 
ensemble does not have any bias (free random walk). This tendency is apparent in the 
random walks in Fig. 1; that of 1/fc-ensemble spends more time in the low-energy region. 

In Fig. 2 histograms of energy distribution for the three methods are shown. As 
expected, the distribution is essentially flat for multicanonical ensemble, it increases with 
decreasing energy for 1/fc-ensemble. As implied by Eq. (0), P\/k{E) x T(E) is a flat 
curve (data not shown). Here, T(E) is the effective temperature defined in Eq. (|10|). 
The distribution of energies for simulated tempering given in Fig. 2 is also flat, but 
has the tendency to increase as energy decreases. The flat distribution does not imply 
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uniform sampling of temperature, since our temperature points are not equidistant. For 
equidistant temperature points we would expect a distribution which is proportional to 
the reciprocal of the specific heat. 

In 1 / /c-ensemble a free random walk in entropy space is expected to be realized (see 
Eq. (§)). In Fig. 3a the time series of the function S(E) = logn(E) is shown. A random 
walk between small S and large S is indeed observed. In Fig. 3b we display the probability 
distribution of entropy for both simulations of multicanonical and 1/k ensembles. This 
should be compared with Fig. 2, where we displayed histograms of energy distribution. 
The multicanonical simulation yields a curve which increases with entropy, while in the 
1/k ensemble we observe a flat distribution, indicating that entropy was indeed uniformly 
sampled. 

In simulated tempering the weight is chosen so that a Id random walk in the temper- 
ature points Ti is obtained (see Eq. (|l5l)). This can be seen in Fig. 4, where we display 
the time series of temperature for simulated tempering. As expected, a random walk 
between high and low temperatures is observed. Since all temperatures should appear 
with the same weight, we expect that the distribution of temperatures is essentially flat. 
The simulation results confirmed this within the deviations of factor 2 from flatness (data 
not shown). As in the case of the multicanonical approach, one has to make a trade-off 
between the numerical effort one is willing to put into determination of weights and the 
deviation from a flat probability distribution one is willing to accept. We allowed for 
differences of less than one order of magnitude. 

A major advantage of all three methods studied in this paper is that they allow 
calculations of thermodynamic quantities at any temperature from just one simulation 
run, once the weights are determined (see Eqs. (|6]) and (0)). As examples we show 
in Fig. 5 the average potential energy < E >t and the specific heat C as functions of 
temperature. The latter quantity is defined by 

C(T) = < E ' >T N - T : E , (24) 

where N(= 5) is the number of amino acids in the peptide. All three methods yield the 
same values at most of the temperature values within the (small) error bars. These results 
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demonstrate that the three methods are equally well suited for calculation of thermody- 
namic quantities. It also proves that our estimates of these quantities are reliable (since 
they were calculated from independent simulation data obtained by different methods; 
there is no systematic hidden bias). This is especially important for the low temperature 
region where comparison with canonical simulation is not possible. Note that the average 
potential energy at the lowest-temperature region is about —12 kcal/mol, which is the 
global-minimum energy value for the energy function in KONF90.[fj7| The value at a high 
temperature, say T = 1000 K, is as large as ~ 16 kcal/mol. The energy fluctuation SE 
at this temperature is about 5 kcal/mol (calculated from the value of C from Eq. fl2"4|) by 



5E = y5C/P 2 ). Hence, the random walks in Fig. 1 have reached the high-energy region 
with energy values that would be obtained at temperatures higher than 1000 K. 

The behavior of the specific heat is also reasonable. The peak atT« 300 K implies 
that this temperature is most relevant for the folding of the peptide. The zero-temperature 
limit of the specific heat agrees with that of the harmonic approximation to the potential 
energy, where equipartition theorem can be used: 

<E>= Np 1 - . (25) 

Here, Np is the number of degrees of freedom (number of torsion angles) and we have set 

again ks = 1- The above specific heat is defined to be the derivative of average energy 

per residue with respect to temperature: 

d «E>/N) = N L 

dT 2N K J 

For Met-enkephalin Np = 19 (number of free torsion angles) and N = 5, and we have 

C — 1.9. The zero-temperature limit extrapolated in Fig. 5 agrees with this value. 

A comparison of Figs. 1 and 4 gives us a way of comparing the performance of the 

three methods in terms of sampling the ground-state conformations. It is evident that 

low energy (temperature) states which are separated in the time series by high energy 

(temperature) states are uncorrelated. The number of such "tunneling" events is therefore 

a lower limit for the number of independent low-energy states visited in the simulation. 

We define a tunneling event as the walk in the simulation between a conformation with 

energy E < —11.0 kcal/mol (i.e., a ground-state conformation) (or T = 50 K in the 
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case of simulated tempering) to a conformation with energy above E = 20.0 kcal/mol 
(or T = 1000 K for simulated tempering) and back to the ground-state region. For each 
tunneling event the lowest energy, E GS , obtained during the corresponding cycle was 
monitored. Table 1 summarizes our results. In our production runs of each 1,000,000 
MC sweeps we observed 23 tunneling events for the multicanonical simulation, 27 for 
the simulation in the l/fc-ensemble, and 19 in the case of simulated tempering. These 
numbers correspond to tunneling times (average number of MC sweeps needed for a 
tunneling event) of 40324, 35664, and 47874 MC sweeps for multicanonical, 1/fc-ensemble, 
and simulated tempering simulations. Note that the number of tunneling events from the 



multicanonical simulation is higher than the one quoted in Ref. |T8|], where we found only 
18 tunneling events and a tunneling time of 54136. We assume that this improvement is 
due to the improvement of the multicanonical weight. It is interesting to observe that the 
number of tunneling events (and therefore independent ground-state conformations found 
in the course of simulation) is larger for the 1/k ensemble than for the multicanonical 
simulation. The observed differences may be due to large statistical fluctuations (see 
the standard deviations of tunneling times in the last row of Table 1). In any case the 
differences in tunneling times which we found are too small to establish a ranking of the 
performances among the three methods. 

We conclude that all three methods, while not differing much from each other, are much 
more efficient in finding independent ground-state structures than traditional methods. 



In Ref. [[18[ we made a comparison of multicanonical algorithms with simulated annealing, 



including annealing versions of the multicanonical simulations. J32[ Multicanonical anneal- 
ing simplifies the determination of weight factors and can be used to search groundstates, 
but does not allow estimation of thermodynamic quantities. Here we also performed 
annealing simulations in l/fc-ensemble and simulated tempering. Twenty independent 
annealing runs with 50,000 MC sweeps were made for the two methods. An annealing 
simulation in 1/ fc-ensemble was done as in the same manner as for multicanonical anneal- 



ing, lyl Namely, an upper bound in energy E wa n is introduced, above which a simulation 
is not allowed to enter. The weight is updated so that the energy distribution becomes 
flat in the interval (E wa u — AE, E wa u), where AE, sampling energy interval, is a constant. 
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The upper bound E wa u is lowered once after each iteration so that E wa u = E + AE, 
where Eq is the lowest energy found in the preceeding iteration. A simulated tempering 
annealing simulation is implemented similarly by replacing energy by temperature in the 
above procedure (for details of the annealing algorithms, see Ref. [p~8j| ) . 



The lowest energies obtained by each annealing run are summarized for the three 
methods and compared with simulated annealing in Table 2, where the values for mul- 



ticanonical annealing and simulated annealing were taken from Ref. |18[ . The results 
are all similar with high probability ( 75 - 90 % ) of finding the energy global mini- 
mum in contrast with the moderate probability (around 40 %) of those by Monte Carlo 
simulated annealing without much fine-tuning of the annealing schedule. |18| To compare 
different annealing schedules we varied the number of independent runs and MC sweeps 
per run keeping their product constant. The results are shown in Table 3, where data 



for multicanonical annealing and simulated annealing were taken from Ref. [|T^]. Again 
our results do not allow a ranking of the generalized ensemble algorithms, but shows that 
they perform better than simulated annealing. 

In order to obtain an optimal annealing schedule, one has to monitor the specific heat. 
One has to lower the temperature very slowly where the specific heat has a peak so that 
a wide variety of conformations are sampled. ^From Fig. 5 one finds that the present 
system of Met-enkephalin has a peak of specific heat at « 300 K. This temperature 
in turn corresponds to the average potential energy of ~ —1 kcal/mol, as is shown in 
Fig. 5. Since the lowest energy is about —12 kcal/mol, we conclude that the sampling 
energy interval AE [|18| for the annealing simulations in multicanonical and 1/k ensembles 
should be more than 11 (—1 — (—12)) kcal/mol. The results in Table 2 were obtained with 
AE = 15 kcal/mol. As for the simulated tempering annealing simulation, we conclude 
that the corresponding sampling temperature interval AT should be at least AT = 250 
(300 — T min ) K. The results in Table 2 were obtained with AT = 300 K. These analyses 
give some hint for the determination of optimal annealing conditions. 

Finally, in Fig. 6 we show the time series of energy from typical annealing runs for 
multicanonical algorithm, l/fc-ensemble, simulated tempering, and simulated annealing. 
From the Figure we can understand why the performances of the former three algorithms 
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were better than a regular simulated annealing with naive exponential annealing schedule 
of Ref. [[31] (without reheating). Namely, as the simulation proceeds, the size of energy 
fluctuations is fixed to be a preset finite value for the former three methods, while it 
decreases towards zero for the latter. Hence, the chance of getting trapped in an energy 
local minima becomes higher and higher as the simulation proceeds for Monte Carlo 
simulated annealing. 
CONCLUSIONS 

We have performed simulations with high statistics for a simple peptide, Met-enkephalin, 
to compare numerically the performances of three new algorithms in the protein folding 
problem. All three methods have in common that simulations are performed in an ex- 
tended ensemble instead of the ususal canonical ensemble. They all require as a first step 
the calculation of estimators for the not a priori known probability weight factors. Using 
an analytical transformation of the obtained distribution to a canonical distribution, any 
thermodynamic quantity at any temperature can in principle be calculated from one sim- 
ulation run. We demonstrated that the calculated thermodynamic quantities over a wide 
range of temperatures were identical for the three methods. In particular, the agreement 
persisted into the difficult, low-temperature regime, where regular canonical simulations 
will necessarily get trapped in one of huge number of energy local minima. Hence, one 
can cross-check the low-temperature results obatined from one of the three methods by 
those from another. Furthermore, by comparing the efficiency of the three methods in 
finding independent ground-state structures, we found that the three methods are equally 
more effective than traditional methods. 

We conclude that all three methods are equally well suited for the simulation of pep- 
tides and proteins. Temperature, energy, or entropy is also by no means the only variables 
in which a uniform sampling is possible, nor is there any restriction on one variable. The 
three studied algorithms are only special cases of a larger class of similar methods, which 
we may call "simulations in generalized ensemble" . 
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TABLE CAPTIONS: 

Table 1: Estimated ground-state energies Eqs (in kcal/mol) of each tunneling event as 
obtained by the three algorithms. t min is the sweep when the simulation first entered the 
groundstate region (E < —11.0 kcal/mol) in the corresponding tunneling event. < t tun > 
is the average time in Monte Carlo sweeps between these tunneling events. The numbers 
in brackets are the standard deviation of the corresponding quantities. 
Table 2: Lowest energy (in kcal/mol) obtained by the 20 annealing runs of the three 
algorithms. For comparison, the results from simulated annealing are also included. For 
all cases, the total number of Monte Carlo sweeps per run was 50,000. ncs is the number 
of runs in which a conformation with E < —11.0 kcal/mol was obtained. The data for 



multicanonical annealing and simulated annealing were taken from Ref. [18]. 
Table 3: Number of runs that reached ground-state conformations for the three algorithms. 
Keeping the total number of MC sweeps constant we varied the number of independent 
runs. For comparison, simulated annealing data are included. The data for multicanoni- 



cal annealing and simulated annealing were taken from Ref. [T8| . 
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Table 2. 
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Table 3. 
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FIGURE CAPTIONS: 

FIG. 1: Time series of energy E (kcal/mol) from a simulation of 1,000,000 MC sweeps by 

multicanonical, 1/ /c-sampling, and simulated tempering algorithms. 

FIG. 2: Probability distribution of energy E (kcal/mol) from simulations by the three 

methods. The data rely on 1,000,000 MC sweeps for each simulation. 

FIG. 3a: Time series of entropy S from a l//c-ensemble simulation of 1,000,000 MC sweeps. 

FIG. 3b: Probability distribution of entropy S from simulations in multicanonical and 

l/k ensembles. The data rely on 1,000,000 MC sweeps for each simulation. 

FIG. 4: Time series of temperature T (K) from a simulated tempering simulation of 

1,000,000 MC sweeps. 

FIG. 5: Average energy < E > T (kcal/mol) and specific heat C as a function of temper- 
ature T (K) calculated from the simulations of multicanonical ensemble, l//c-ensemble, 
and simulated tempering. The data rely on 1,000,000 MC sweeps for each method. The 
energy scale is displayed on the ordinate on the left-hand side and the specific heat on 
the right-hand side. The average energy is a monotonically increasing function of tem- 
perature, whereas the specific heat has a maximum around 300 K. 

FIG. 6: Time series of energy E (kcal/mol) from an annealing simulation of 50,000 MC 
sweeps by multicanonical, l/fc-sampling, simulated tempering, and simulated annealing 
algorithms. 
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